Abstract

Quantum algorithms may be described by sequences of unitary transformations, called quantum gates, and measurements applied to a quantum register of n quantum bits, or qubits. A collection of quantum gates is called universal if it can be used to construct any n-qubit gate. In 1995, Barenco et al. showed the universality of the set consisting of the one-qubit gates and the controlled NOT gate using the QR decomposition of unitary matrices. Almost ten years later, the decomposition was improved to include substantially fewer elementary gates. In addition, the cosine-sine matrix decomposition was applied to efficiently implement decompositions of general quantum gates. In this chapter, we review the different types of general gate decompositions and slightly improve the best known gate count for the controlled NOT gates to $\frac{23}{48} 4^n$ in the leading order. In physical realizations, the interaction strength between the qubits can decrease strongly as a function of their distance. Therefore, we also discuss decompositions restricted to nearest-neighbor interactions in a linear chain of qubits.



1 Introduction 

The emergence of quantum mechanics [1] at the beginning of the 20th century revolutionized physics, bringing not only an understanding of fundamental concepts in, e.g., atomic and particle physics, but also numerous applications in everyday life. One of the most important applications is semiconductor technology, notably the transistor, which is the basis of today's digital computers. As quantum mechanics shook up physics, quantum computing [2] has done the same for computer science. Some quantum algorithms, Shor's algorithm [3] being the most famous, offer an exponential speedup compared with the best known classical counterparts owing to the phenomenon called quantum parallelism. Shor's algorithm may be used to break the commonly used RSA encryption for key distribution but, on the other hand, quantum physics also provides a secure information channel through quantum cryptography [4]. Owing to these powerful applications, the experimental realization of a quantum computer is regarded as a highly important issue in physics. Similarly, theoretical research that eases the experimental requirements is also of great interest.

In quantum computing, algorithms are commonly described by the quantum circuit model [5]. It involves quantum gates, projective measurements, and a register of n quantum bits, called qubits. In a classical computer, a bit may have only two distinct values, usually denoted by 0 and 1. In contrast, a qubit may be in a superposition of these two basis states, i.e., the state of the qubit is described by a vector in the complex space $\mathbb{C}^2$. In this space, quantum gates correspond to matrices, which are unitary owing to the unitary temporal evolution of any closed quantum system.

Since many algorithms involve gates acting on n qubits, an important issue is how these gates may be decomposed into an array of simpler gates accessible to experiments. In general, we may assume that we have a collection of simple quantum gates, called the gate library, into which the n-qubit gates are to be decomposed. The gates in the gate library are called elementary gates. The library is called universal if any n-qubit gate has a representation involving only gates from that library. We choose our library to consist of all one-qubit gates and the controlled NOT gate (CNOT), which are defined in Sec. 2.2. This particular library has been proved to be universal [6] but, in fact, almost any other two-qubit gate could replace the CNOT without breaking universality [7]. Nevertheless, it is convenient to work with the CNOT since it has a rather simple logical structure.

The proof of the universality of our gate library [6] was, in fact, constructive, but the number of CNOTs involved was as high as $O(n^3 4^n)$. It is convenient to count the CNOTs and the one-qubit gates separately, since CNOTs introduce interactions between the qubits, and those interactions are usually much weaker than the interactions between a single qubit and the control fields. Hence, the experimental realization of the CNOT is typically a much slower process than that of a one-qubit gate. Already in 1995, it was shown that the circuit complexity could be reduced to $O(n 4^n)$ [8], but until 2004 there was no notable progress on the decomposition of arbitrary quantum gates. Reference [9] briefly reviews the traditional decomposition of Ref. [6].

The highest known lower bound for the number of CNOTs needed to decompose a general unstructured quantum gate acting on n qubits is $\lceil (4^n - 3n - 1)/4 \rceil$ [10] and, hence, the best known complexity contained an extra factor of n. Finally, in 2004, after the problem had remained open for about ten years, the original construction was improved to yield the complexity $O(4^n)$ [11]. However, this decomposition was still far from the lower bound. The original gate decomposition made use of the QR matrix decomposition [12]. In contrast, Ref. [13] introduced the cosine-sine matrix decomposition¹ (CSD) [12] in this context, which turned out to yield a leading-order complexity of $4^n$ for the one-qubit gates and $4^n$ for the CNOTs. The CSD was also combined with a so-called quantum multiplexor (QM), a special method to simplify the gates, to obtain a decomposition involving $4^n/2$ CNOTs and the same number of one-qubit gates in the leading order [15] (see also Ref. [16]). In this chapter, we present an improvement to the decomposition introduced in Ref. [16] to obtain the lowest CNOT count known to date.

This chapter is organized as follows. In Sec. 2, we define our notation and introduce some of the important mathematical tools. Section 3 is devoted to the presentation of so-called uniformly controlled gates (UCGs) and their efficient decomposition into elementary gates. The UCGs are the natural building blocks of decompositions employing the CSD. The original QR decomposition and its improved versions are discussed in Sec. 4, whereas Sec. 5 is devoted to the CSD. Finally, local state preparation, i.e., the question of how to transform any given quantum state into another arbitrary state, is implemented in Sec. 6 following Refs. [15-17]. State preparation may be useful if one wishes to use, for example, exotic inputs to algorithms. In Sec. 7, we summarize our discussion and conclude.

2 Preliminaries 

2.1 Quantum state and unitary temporal evolution 

We consider here a quantum register consisting of n qubits and, hence, all possible quantum states of the system lie in the Hilbert space $\mathcal{H} := \bigotimes_{i=1}^{n} \mathbb{C}^2 = \mathbb{C}^{2^n}$, where the symbol $\otimes$ denotes the Kronecker product. The basis vectors for each of the qubits are chosen as

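$$ |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \qquad (1) $$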

For the whole configuration space $\mathcal{H}$, it is convenient to choose the basis vectors to be $\{|e_k\rangle\}$, $k = 1, \ldots, N := 2^n$. Here $|e_k\rangle = \bigotimes_i |x_i^k\rangle$, where $x_i^k \in \{0, 1\}$ and the index $i = 1, \ldots, n$ refers to the qubit. In this basis, the state vector of the system is of the form

$$ |\Psi\rangle = \sum_{i=1}^{N} a_i |e_i\rangle \quad \text{and} \quad \sum_{i=1}^{N} |a_i|^2 = 1, \qquad (2) $$

where the latter equality fixes the normalization of the vector. Hence, the probability of finding the system in the state $|e_i\rangle$ in a projective measurement is $|a_i|^2$. We also note that the global phase of the state vector is unobservable and, hence, may be fixed to unity².

² Clearly, the global phase does not affect the probabilities. Furthermore, multiplication by a global phase factor commutes with any unitary matrix, i.e., it has no effect on the temporal evolution of the system.
Conventionally in quantum computing, the basis vectors are ordered such that the values $x_i^k$ form the binary representation of the number $k - 1$, i.e., $k = 1 + \sum_{i=1}^{n} 2^{n-i} x_i^k$. We note that the order of the basis vectors in the computational basis can be chosen freely. We will make use of this freedom in Sec. 4 in the context of the QR decomposition.
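The basis convention above can be checked numerically. The following is a minimal NumPy sketch, not taken from the chapter; the helper names `basis_vector`, `basis_index`, and `probabilities` are our own. It builds $|e_k\rangle$ as a Kronecker product of the kets of Eq. (1), confirms that its single nonzero entry sits at the position given by $k = 1 + \sum_i 2^{n-i} x_i^k$, and checks that a global phase leaves the probabilities $|a_i|^2$ unchanged.

```python
import itertools
from functools import reduce

import numpy as np

# One-qubit basis kets |0> and |1> of Eq. (1).
KET = {0: np.array([1, 0], dtype=complex), 1: np.array([0, 1], dtype=complex)}


def basis_vector(bits):
    """Return |e_k> = |x_1> (x) ... (x) |x_n> for bits = (x_1, ..., x_n)."""
    return reduce(np.kron, [KET[b] for b in bits])


def basis_index(bits):
    """1-based index k = 1 + sum_i 2^(n-i) x_i (qubit 1 is the most significant bit)."""
    n = len(bits)
    return 1 + sum(2 ** (n - i) * x for i, x in enumerate(bits, start=1))


def probabilities(state):
    """Measurement probabilities |a_i|^2 of a normalized state vector, Eq. (2)."""
    return np.abs(state) ** 2


if __name__ == "__main__":
    # The Kronecker ordering and the binary labelling agree for every 3-bit string.
    for bits in itertools.product([0, 1], repeat=3):
        assert np.argmax(np.abs(basis_vector(bits))) == basis_index(bits) - 1

    rng = np.random.default_rng(0)
    psi = rng.normal(size=8) + 1j * rng.normal(size=8)
    psi /= np.linalg.norm(psi)
    # A global phase leaves all probabilities unchanged.
    assert np.allclose(probabilities(psi), probabilities(np.exp(0.7j) * psi))
```

Treating qubit 1 as the most significant bit is what makes the Kronecker ordering and the binary labelling agree; reversing only one of the two conventions would scramble the indices.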

The fundamental differences between the quantum computer and the classical one arise from the utilization of the high-dimensional Hilbert space $\mathcal{H}$. In comparison, the states accessible to a classical computer are limited to the basis vectors $|\Psi\rangle = |e_i\rangle$, i.e., to the states in which all of the weight factors except one vanish. The quantum mechanical superposition principle allows several weight factors to be simultaneously non-zero, which renders the quantum mechanical state space vastly larger than the classical one.

The temporal evolution of any quantum system is governed by the well-known Schrödinger equation

$$ i\hbar \frac{\partial}{\partial t} |\Psi(t)\rangle = H |\Psi(t)\rangle, \qquad (3) $$

where the Hamiltonian H of a closed quantum system is always Hermitian. This implies that the temporal evolution may be described by a unitary operator $U(t, 0)$ as $|\Psi(t)\rangle = U(t, 0)|\Psi(0)\rangle$. In our finite-dimensional Hilbert space, the unitary operator may be written as a unitary matrix $U \in SU(N)$. The determinant of U may be taken to be unity because the global phase of the state vector has no physical meaning. Since an n-qubit quantum gate may be represented by a unitary matrix, it is natural that gate decompositions correspond to known matrix decompositions and vice versa.



2.2 Quantum circuits 

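A one-qubit gate $U \in SU(2)$ acting on the kth qubit in an n-qubit register is represented by the unitary matrix

$$ \underbrace{I \otimes \cdots \otimes I}_{k-1 \text{ times}} \otimes\, U \otimes \underbrace{I \otimes \cdots \otimes I}_{n-k \text{ times}}. \qquad (4) $$

For simplicity, we omit below the qubits that are operated on only by an identity operator. Accordingly, the matrix representation of the gate U is

$$ U = \begin{pmatrix} a & -b^* \\ b & a^* \end{pmatrix}, \qquad (5) $$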

where a and b are two complex numbers satisfying $|a|^2 + |b|^2 = 1$. We fix the basis of the two-state system such that the generator $\sigma_z$ of the SU(2) group is diagonal. Furthermore, we denote the eigenvectors corresponding to the eigenvalues 1 and -1 by $|0\rangle$ and $|1\rangle$, respectively. In this basis, the matrix representations of the generators $\{\sigma_i\}$ are called the Pauli spin matrices

$$ \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (6) $$
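To make Eqs. (4)-(6) concrete, here is a small NumPy sketch (the helper names `su2` and `embed_one_qubit` are hypothetical) that builds the Pauli matrices, a one-qubit gate of the form (5), and its embedding on qubit k of an n-qubit register via Kronecker products.

```python
from functools import reduce

import numpy as np

# Pauli spin matrices, Eq. (6).
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]])
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def su2(a, b):
    """One-qubit gate in the form of Eq. (5); requires |a|^2 + |b|^2 = 1."""
    return np.array([[a, -np.conj(b)], [b, np.conj(a)]], dtype=complex)


def embed_one_qubit(U, k, n):
    """Eq. (4): act with the 2x2 matrix U on qubit k (1-based) of an n-qubit register."""
    factors = [np.eye(2)] * (k - 1) + [U] + [np.eye(2)] * (n - k)
    return reduce(np.kron, factors)
```

For example, `embed_one_qubit(SX, 1, 2)` flips the first (most significant) qubit and therefore maps $|00\rangle$ to $|10\rangle$.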

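In fact, any $U \in SU(2)$ may be written as a rotation

$$ U = R_{\mathbf{a}}(\theta) = e^{i\theta\, \mathbf{a}\cdot\boldsymbol{\sigma}/2} = I \cos\frac{\theta}{2} + i (\mathbf{a}\cdot\boldsymbol{\sigma}) \sin\frac{\theta}{2}, \qquad (7) $$

where θ stands for the rotation angle about the unit vector $\mathbf{a}$ fixed by U, and we have introduced the product $\mathbf{a}\cdot\boldsymbol{\sigma} = a_x\sigma_x + a_y\sigma_y + a_z\sigma_z$. Equation (7) implies that any rotation $R_{\mathbf{a}}(\theta)$ can be diagonalized as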

$$ R_{\mathbf{a}}(\theta) = V_{\mathbf{a}} R_z(\theta) V_{\mathbf{a}}^\dagger, \qquad (8) $$

where the similarity transformation $V_{\mathbf{a}}$ diagonalizes the matrix $\mathbf{a}\cdot\boldsymbol{\sigma}$. We note that the matrix $V_{\mathbf{a}}$ does not depend on the rotation angle θ. In addition, rotations about a common axis are additive,

$$ R_{\mathbf{a}}(\theta_1) R_{\mathbf{a}}(\theta_2) = R_{\mathbf{a}}(\theta_1 + \theta_2), \qquad (9) $$

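and the rotation angle of any rotation with $a_x = 0$ is reversed by conjugation with $\sigma_x$:

$$ \sigma_x R_{\mathbf{a}}(\theta) \sigma_x = R_{\mathbf{a}}(-\theta). \qquad (10) $$

This follows because $\sigma_x$ anticommutes with $\sigma_y$ and $\sigma_z$, so that $\sigma_x (\mathbf{a}\cdot\boldsymbol{\sigma}) \sigma_x = -\mathbf{a}\cdot\boldsymbol{\sigma}$ whenever $a_x = 0$.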
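The rotation identities (7)-(10) are easy to verify numerically. The sketch below assumes NumPy and the sign convention of Eq. (7); `rotation` and `diagonalizer` are illustrative names, and $V_{\mathbf{a}}$ is obtained from `numpy.linalg.eigh` by reordering the eigenvectors to match $\sigma_z$.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]])
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def rotation(axis, theta):
    """R_a(theta) = I cos(theta/2) + i (a.sigma) sin(theta/2), Eq. (7)."""
    a = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    a_sigma = a[0] * SX + a[1] * SY + a[2] * SZ
    return np.cos(theta / 2) * np.eye(2) + 1j * np.sin(theta / 2) * a_sigma


def diagonalizer(axis):
    """Unitary V_a with a.sigma = V_a sigma_z V_a^dagger, as used in Eq. (8)."""
    a = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    a_sigma = a[0] * SX + a[1] * SY + a[2] * SZ
    _, vecs = np.linalg.eigh(a_sigma)  # eigenvalues come out in ascending order (-1, +1)
    return vecs[:, ::-1]               # reorder to (+1, -1) to match sigma_z
```

With an axis such as $(0, 0.6, 0.8)$, which has $a_x = 0$, the products `rotation(axis, 0.3) @ rotation(axis, 0.5)` and `SX @ rotation(axis, t) @ SX` reproduce Eqs. (9) and (10), respectively.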

Rotations whose rotation axis is parallel to one of the coordinate axes are called elementary rotations and are denoted by $R_x(\theta)$, $R_y(\theta)$, and $R_z(\theta)$. Any element $U \in SU(2)$ may be written using only two different types of elementary rotations, e.g., z and y rotations, as

$$ U = R_z(\alpha) R_y(\beta) R_z(\gamma), \qquad (11) $$

where α, β, and γ are called the Euler angles. The above results are used in the following sections to obtain and simplify the gate decompositions studied.
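The Euler angles of Eq. (11) can be read off directly from the matrix elements. The following sketch (our own derivation under the convention of Eq. (7), with hypothetical helper names) uses the fact that multiplying out $R_z(\alpha)R_y(\beta)R_z(\gamma)$ gives $U_{11} = \cos(\beta/2)e^{i(\alpha+\gamma)/2}$ and $U_{21} = -\sin(\beta/2)e^{-i(\alpha-\gamma)/2}$, and inverts these two relations.

```python
import numpy as np


def rz(t):
    """R_z(t) = exp(i t sigma_z / 2), convention of Eq. (7)."""
    return np.diag([np.exp(0.5j * t), np.exp(-0.5j * t)])


def ry(t):
    """R_y(t) = exp(i t sigma_y / 2) = [[cos, sin], [-sin, cos]] at half angle."""
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, s], [-s, c]], dtype=complex)


def euler_zyz(U):
    """Angles (alpha, beta, gamma) with U = R_z(alpha) R_y(beta) R_z(gamma); U must be in SU(2).

    U[0, 0] = cos(beta/2) e^{i(alpha+gamma)/2} and U[1, 0] = -sin(beta/2) e^{-i(alpha-gamma)/2}.
    """
    top, low = U[0, 0], U[1, 0]
    beta = 2 * np.arctan2(abs(low), abs(top))
    alpha = np.angle(top) - np.angle(-low)   # alpha + gamma = 2 arg(top)
    gamma = np.angle(top) + np.angle(-low)   # alpha - gamma = -2 arg(-low)
    return alpha, beta, gamma
```

Only two matrix elements are needed because the remaining two are fixed by the SU(2) structure of Eq. (5).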

The circuit diagram for the one-qubit gate U is shown in Fig. 1(a). The only two-qubit gate in our library is the CNOT, shown in Fig. 1(b). The CNOT acts as a logical NOT on the target qubit in the subspace spanned by $\{|10\rangle, |11\rangle\}$ and leaves untouched the subspace in which the control qubit (the upper qubit) is zero. The matrix representation of the CNOT in the basis $\{|00\rangle, |01\rangle, |10\rangle, |11\rangle\}$ is

$$ U_{\mathrm{CNOT}} = I \oplus \sigma_x = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \qquad (12) $$



In general, the qubits are denoted by horizontal lines in quantum circuit diagrams and the gates by rectangles. The control nodes are marked by circles, which are connected to the associated gate by a vertical line. The effect of the control nodes is to restrict the corresponding gate to act only on the subspace characterized by its control nodes. The nodes in a quantum circuit diagram can be black or white, corresponding to the control qubit states $|1\rangle$ or $|0\rangle$, respectively (see Figs. 1(c) and 1(d)). Hereafter, we refer to the k-fold controlled one-qubit gate V by $C^k V$. When applied to an n-qubit register, this gate operates in the $2^{n-k}$-dimensional target subspace consisting of those basis vectors for which the values of the control qubits match those of the control nodes.

Figure 1: Quantum circuit symbols for (a) a one-qubit gate, (b) the CNOT, (c) a controlled one-qubit gate, and (d) a twofold controlled two-qubit gate. In (c) the gate W acts only on the subspace in which the control qubit is in the state $|0\rangle$, and in (d) the gate V operates on the subspace in which the control qubits are in the state $|10\rangle$.
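The action of control nodes can be captured by a short matrix constructor. This is a sketch under our own conventions (qubit 1 is the most significant bit, and the function name `controlled_gate` is hypothetical): black and white nodes are encoded as required values 1 and 0, and V is applied only in the matching target subspace. With a single black control and $V = \sigma_x$ it reproduces Eq. (12).

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)


def controlled_gate(n, controls, target, V):
    """Matrix of a controlled one-qubit gate on n qubits (qubit 1 = most significant bit).

    controls maps qubit -> required value: 1 for a black node, 0 for a white node.
    V acts on `target` only in the subspace where every control matches.
    """
    dim = 2 ** n
    M = np.eye(dim, dtype=complex)
    tbit = 1 << (n - target)          # bit position of the target qubit in the basis index
    for idx in range(dim):
        if idx & tbit:
            continue                  # handle each target pair once, from its |...0...> member
        if all(((idx >> (n - q)) & 1) == val for q, val in controls.items()):
            pair = [idx, idx | tbit]
            M[np.ix_(pair, pair)] = V
    return M
```

For instance, `controlled_gate(3, {1: 1, 2: 0}, 3, V)` is the twofold controlled gate $C^2 V$ that acts only on $|100\rangle$ and $|101\rangle$.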

3 Uniformly controlled gates 

3.1 Decomposition of uniformly controlled elementary rotations 

Sequences of consecutive controlled gates with slightly different control node configurations often appear in quantum circuit diagrams. Let us call a sequence of $2^k$ gates, each having a different configuration of k control nodes, a uniformly controlled U gate; see Fig. 2. The gate shown acts on the whole n-qubit register and, hence, it has $m = n - k$ target qubits, denoted by the set T. Let us denote a gate of this kind by the symbol $F_k^{T}(U(2^m))$.

The concept of uniformly controlled gates with an efficient gate implementation was first introduced in Ref. [13] in the context of uniformly controlled rotations. It has also been utilized in decompositions of general n-qubit gates [13, 15, 16, 18] and in the preparation of quantum states [15-17]. Bullock et al. have generalized uniformly controlled gates to quantum registers built of qudits, i.e., d-level (d > 2) quantum systems [19]. The methods to implement uniformly controlled z rotations are also closely related to the earlier work by Bullock and Markov [20].

Let us construct an elementary gate circuit for a uniformly controlled one-qubit gate. We present the decomposition of uniformly controlled one-parameter rotations, $F_k^t(R_{\mathbf{a}})$, separately, since they require fewer gates to implement than general $F_k^t(U(2))$ gates. In a gate $F_k^t(R_{\mathbf{a}})$, the rotation angles vary, but the rotation axis is the same for each of the subrotations. In the spirit of Eq. (8), we may assume that the fixed axis $\mathbf{a}$ is perpendicular to the x axis and, hence, we may employ Eq. (10) in the calculations.

Figure 2: k-fold uniformly controlled m-qubit gate, $F_k^T(U(2^m))$, which stands for a sequence of k-fold controlled gates, each acting on the set of target qubits T. Here $U_i \in U(2^m)$, where $i = 1, \ldots, 2^k$.

Figure 3: Implementation of a uniformly controlled one-parameter rotation $R_{\mathbf{a}}$ ($a_x = 0$) using elementary gates.

Figure 3 shows how to decompose a gate $F_1^t(R_{\mathbf{a}})$ into two CNOTs and two elementary rotations. For the states with the control qubit in the state $|0\rangle$, the CNOTs are inactive and, by Eq. (9), the rotation angles of the rotations $R_{\mathbf{a}}((\alpha+\beta)/2)$ and $R_{\mathbf{a}}((\alpha-\beta)/2)$ add up to give the correct rotation $R_{\mathbf{a}}(\alpha)$. For the control qubit in the state $|1\rangle$, the angle of the rotation $R_{\mathbf{a}}((\alpha-\beta)/2)$ is negated according to Eq. (10), and the resulting gate is $R_{\mathbf{a}}(\beta)$. By adding qubits and control nodes, we obtain the general step for eliminating control nodes from uniformly controlled rotations, as shown in Fig. 4(a).
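The argument for Fig. 3 can be checked directly. Since the figure itself is not reproduced here, the sketch below assumes one gate ordering consistent with the text: $R_{\mathbf{a}}((\alpha+\beta)/2)$, a CNOT, $R_{\mathbf{a}}((\alpha-\beta)/2)$, and a second CNOT, all acting on the lower (target) qubit, with the axis $\mathbf{a} = (0, \sin\phi, \cos\phi)$ so that $a_x = 0$. The resulting 4×4 matrix should be block diagonal with blocks $R_{\mathbf{a}}(\alpha)$ and $R_{\mathbf{a}}(\beta)$. NumPy and the helper names are assumptions of the sketch.

```python
import numpy as np

SY = np.array([[0, -1j], [1j, 0]])
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)


def rot_yz(phi, theta):
    """R_a(theta) about the axis a = (0, sin phi, cos phi), which has a_x = 0."""
    a_sigma = np.sin(phi) * SY + np.cos(phi) * SZ
    return np.cos(theta / 2) * np.eye(2) + 1j * np.sin(theta / 2) * a_sigma


def uc_rotation_two_cnots(phi, alpha, beta):
    """Time order: R((alpha+beta)/2), CNOT, R((alpha-beta)/2), CNOT on the target qubit."""
    r1 = np.kron(np.eye(2), rot_yz(phi, (alpha + beta) / 2))
    r2 = np.kron(np.eye(2), rot_yz(phi, (alpha - beta) / 2))
    return CNOT @ r2 @ CNOT @ r1  # matrices multiply in reverse time order
```

The control-$|1\rangle$ block equals $\sigma_x R_{\mathbf{a}}((\alpha-\beta)/2)\sigma_x R_{\mathbf{a}}((\alpha+\beta)/2) = R_{\mathbf{a}}(\beta)$, which is exactly the use of Eqs. (9) and (10) described above.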



Mikko Möttönen and Juha J. Vartiainen




Figure 4: Decomposition of a uniformly controlled one-qubit gate: (a) one-parameter rotation, (b) general one-qubit gate $U \in SU(2)$.



3.2 Decomposition of uniformly controlled one-qubit gates 

To justify the control node elimination shown in Fig. 4(b), we need to introduce the so-called constant quantum multiplexor. The idea is that a onefold uniformly controlled one-qubit gate is decomposed as

$$ \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} = \begin{pmatrix} r^\dagger & 0 \\ 0 & r \end{pmatrix} \begin{pmatrix} u & 0 \\ 0 & u \end{pmatrix} \begin{pmatrix} d & 0 \\ 0 & d^\dagger \end{pmatrix} \begin{pmatrix} v & 0 \\ 0 & v \end{pmatrix}, \qquad (13) $$

where a, b, u, and v are unitary and r and d are diagonal unitary 2×2 matrices. Here a and b are fixed by the uniformly controlled gate we are implementing, u and v correspond to the resulting one-qubit gates, and the uniformly controlled z rotation corresponding to the matrix r is to be tuned such that the diagonal matrix d separating the one-qubit gates is independent of a and b.
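Equation (13) yields the matrix equations

$$ a = r^\dagger u d v, \qquad (14) $$

$$ b = r u d^\dagger v, \qquad (15) $$

or, equivalently,

$$ X := a b^\dagger = r^\dagger u d^2 u^\dagger r^\dagger, \qquad (16) $$

$$ v = d u^\dagger r^\dagger b = d^\dagger u^\dagger r a. \qquad (17) $$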

Equation (16) may be recast into a form reminiscent of an eigenvalue decomposition:

$$ r X r = u d^2 u^\dagger =: u \Lambda u^\dagger. \qquad (18) $$

Note that X is fixed by the matrices a and b, but r can be chosen freely. By diagonalizing the matrix $rXr$, we find the similarity transformation u and the eigenvalue matrix $\Lambda = d^2$. The matrix v is then obtained by inserting the results into Eq. (17).



Decompositions of general quantum gates

Figure 5: Elementary gate sequence for the D gate, where H is the Hadamard gate and $R_z = R_z(\pi/2)$. The gate $P = e^{-i\pi/4}$ is an adjustment of the global phase.



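Since $X \in U(2)$, we may express it using the parametrization

$$ X = \begin{pmatrix} x_1 & x_2 \\ -x_2^* & x_1^* \end{pmatrix} e^{i\phi/2}, \qquad (19) $$

where $|x_1|^2 + |x_2|^2 = 1$ and $\det(X) = e^{i\phi}$. With $r = \mathrm{diag}(r_1, r_2)$, the characteristic polynomial of the matrix $rXr$ is

$$ \det(rXr - \lambda I) = \lambda^2 - \lambda \left(r_1^2 x_1 + r_2^2 x_1^*\right) e^{i\phi/2} + r_1^2 r_2^2 e^{i\phi}. \qquad (20) $$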
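Let us fix the freely tunable matrix r to be

$$ r = \begin{pmatrix} e^{\frac{i}{2}\left(\frac{\pi}{2} - \frac{\phi}{2} - \arg(x_1)\right)} & 0 \\ 0 & e^{\frac{i}{2}\left(-\frac{\pi}{2} - \frac{\phi}{2} + \arg(x_1)\right)} \end{pmatrix}, \qquad (21) $$

which makes the linear coefficient in Eq. (20) vanish and the constant term equal to unity, so that the eigenvalues of $rXr$ are $\pm i$ for any a and b. This implies that the matrix d is, indeed, independent of the matrices a and b. Namely,

$$ d = \begin{pmatrix} e^{i\pi/4} & 0 \\ 0 & e^{-i\pi/4} \end{pmatrix}. \qquad (22) $$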

Hence, the diagonal multiplexing gate takes the fixed form $D = e^{i\frac{\pi}{4}\sigma_z \otimes \sigma_z}$, which can be realized straightforwardly using an Ising-type Hamiltonian or, alternatively, decomposed into a CNOT and one-qubit gates as shown in Fig. 5. The single-qubit gates acting on the bottom qubit in Fig. 5 may be merged with the adjacent single-qubit gates u and v, resulting in the single-qubit gates u' and v' shown in Fig. 6. The z rotation acting on the top qubit in Fig. 5 may correspondingly be merged with the uniformly controlled z rotation in Fig. 6 and, hence, we have justified the elimination of the control node for a onefold uniformly controlled one-qubit gate. By adding qubits with control nodes, we obtain the general step for eliminating control nodes from the UCGs, as shown in Fig. 4(b).
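The constant quantum multiplexor can be computed numerically by following Eqs. (16)-(22) step by step. The sketch below assumes NumPy; `constant_multiplexor` and `random_unitary` are our own names. It forms $X = ab^\dagger$, chooses r from Eq. (21), diagonalizes $rXr$ with the eigenvalue $+i$ placed first, and obtains v from Eq. (17). Because the eigenvalues of $rXr$ are distinct and the matrix is normal, the unit-norm eigenvectors returned by `numpy.linalg.eig` are orthogonal up to rounding, so u is unitary to numerical precision.

```python
import numpy as np


def random_unitary(dim, rng):
    """A random unitary (not exactly Haar-distributed) from QR of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q


def constant_multiplexor(a, b):
    """Return (r, u, d, v) with diag(a, b) = diag(r^+, r) (I(x)u) diag(d, d^+) (I(x)v), Eq. (13)."""
    X = a @ b.conj().T                                   # Eq. (16)
    phi = np.angle(np.linalg.det(X))                     # det X = e^{i phi}
    psi = np.angle(X[0, 0] * np.exp(-0.5j * phi))        # arg(x_1) from Eq. (19)
    r = np.diag([np.exp(0.5j * (np.pi / 2 - phi / 2 - psi)),
                 np.exp(0.5j * (-np.pi / 2 - phi / 2 + psi))])  # Eq. (21)
    lam, u = np.linalg.eig(r @ X @ r)                    # Eq. (18): r X r = u Lambda u^+
    order = np.argsort(-lam.imag)                        # eigenvalue +i first, then -i
    lam, u = lam[order], u[:, order]
    d = np.diag(np.sqrt(lam))                            # d^2 = Lambda
    v = d @ u.conj().T @ r.conj().T @ b                  # Eq. (17)
    return r, u, d, v
```

Checking d against Eq. (22) for randomly drawn a and b confirms that the separating diagonal gate is always the same, which is what allows D to be implemented once and for all.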

We note that the uniformly controlled rotations in Fig. 4(a) have the same rotation axis and, hence, commute. Thus the first uniformly controlled rotation may equally well be moved to be the last gate. We call this procedure mirroring the circuit. By using the step of Fig. 4(a) recursively and mirroring every second outcome of the recursion, we obtain the full decomposition of $F_k^t(R_{\mathbf{a}})$ using only $2^k$ one-qubit rotations $R_{\mathbf{a}}$ and the same number of CNOTs. An example of the case k = 3 is shown in Fig. 7(a). When decomposing general one-qubit UCGs, the step in Fig. 4(b) is to be used recursively. There, we have to keep in mind that the CNOT may actually be taken to be the diagonal gate D shown in Fig. 5. Hence, when the recursion is always applied to the leftmost UCG, all the resulting uniformly controlled z rotations may be merged with the adjacent UCGs except the rightmost ones, which pile up to form a diagonal matrix $\Delta_4$. The decomposition of $F_3^t(U(2))$ is shown in Fig. 7(b).

Figure 6: Constant quantum multiplexor for two qubits.

Figure 7: Quantum circuit realizing a threefold uniformly controlled (a) one-parameter rotation and (b) general one-qubit gate. In (a) $\{r_i\}$ stand for one-parameter rotations and in (b) $\{u_i\}$ are general one-qubit gates. Here the gate $\Delta_4$ corresponds to a diagonal 16 × 16 unitary matrix.
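The recursive construction of $F_k^t(R_{\mathbf{a}})$ ends in an alternating chain of $2^k$ rotations and $2^k$ CNOTs whose control qubits change in a Gray-code pattern. Instead of carrying out the recursion literally, the following NumPy sketch uses a closed-form choice of angles, $\theta = M^{-1}\alpha$ with $M_{x,i} = (-1)^{x \cdot g_i}$, where $g_i$ is the ith Gray-code word and $x \cdot g_i$ is the bitwise inner product. We take this to have the same gate count as the recursion above (an assumption of the sketch), and the code checks numerically that the circuit equals the intended block-diagonal gate. It uses $R_y$ as the fixed axis; the helper names are hypothetical.

```python
import numpy as np


def ry(theta):
    """R_y(theta) = exp(i theta sigma_y / 2), consistent with Eq. (7)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, s], [-s, c]], dtype=complex)


def gray(i):
    """Binary reflected Gray code of the integer i."""
    return i ^ (i >> 1)


def uc_ry_circuit(alphas):
    """Gate list for a k-fold uniformly controlled R_y (k >= 1): 2^k rotations and 2^k CNOTs.

    Controls are qubits 0..k-1 (qubit 0 most significant); the target is qubit k.
    By Eq. (10), each CNOT that fires after rotation i flips the sign of theta_i, so the
    net angle for control state x is sum_i (-1)^{x . gray(i)} theta_i = (M theta)_x.
    """
    size = len(alphas)
    k = size.bit_length() - 1
    M = np.array([[(-1) ** bin(x & gray(i)).count("1") for i in range(size)]
                  for x in range(size)])
    thetas = M.T @ np.asarray(alphas, dtype=float) / size   # M^{-1} = M^T / 2^k
    ops = []
    for i in range(size):
        ops.append(("R", thetas[i]))
        bit = (gray(i) ^ gray((i + 1) % size)).bit_length() - 1  # the single changed bit
        ops.append(("CNOT", k - 1 - bit))
    return ops


def simulate(ops, k):
    """Multiply out the gate list on k control qubits plus one target qubit (the last one)."""
    dim = 2 ** (k + 1)
    U = np.eye(dim, dtype=complex)
    for kind, arg in ops:
        if kind == "R":
            G = np.kron(np.eye(2 ** k), ry(arg))
        else:
            G = np.zeros((dim, dim))
            cbit = 1 << (k - arg)                       # bit position of control qubit `arg`
            for idx in range(dim):
                G[idx ^ 1 if idx & cbit else idx, idx] = 1  # flip the target bit
        U = G @ U
    return U
```

Because the Gray code is cyclic, every control bit toggles an even number of times over the whole chain, so the CNOTs cancel overall and only the rotation angles remain.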

In general, the decomposition of a gate $F_k^t(U(2))$ includes an alternating sequence of $2^k$ one-qubit gates and $2^k - 1$ CNOTs, which we denote by $F_k(U(2))$. In addition, the implementation involves a cascade of k distinct uniformly controlled z rotations, which corresponds to a single diagonal (k+1)-qubit gate $\Delta_{k+1}$. However, the implementation of this part of the gate sequence can often be circumvented by merging it with the adjacent gates, as shown in Sec. 5. In fact, if the qubit register is measured directly after the action of the gate $F_k^t(U(2))$, the diagonal gate may be left unimplemented, since it only changes the phases of the probability amplitudes and hence does not affect the measurement probabilities.

Figure 8: CNOT cascade that can be efficiently implemented using nearest-neighbor CNOTs [21].
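A diagonal gate only multiplies each amplitude by a phase, so it cannot change measurement probabilities in the computational basis. A minimal NumPy check, with a randomly chosen three-qubit state and a random diagonal gate standing in for $\Delta_{k+1}$ (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
psi = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
psi /= np.linalg.norm(psi)

# A random diagonal unitary playing the role of Delta_{k+1}.
delta = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, size=2 ** n)))
after = delta @ psi

# The amplitudes change, but their magnitudes (and hence the probabilities) do not.
probs_before = np.abs(psi) ** 2
probs_after = np.abs(after) ** 2
```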



3.3 Nearest-neighbor decompositions 

In a practical realization of a quantum computer, the spatial arrangement of the qubits or other physical constraints may limit the interactions between them. Let us consider a quantum register whose topology corresponds to that of a linear chain and which allows gates to act only on nearest-neighbor qubits. This topology turns out to be amenable to implementing uniformly controlled gates, which may have important consequences for the experimental realization of quantum computing.

The quantum circuit presented for a uniformly controlled gate can be translated efficiently into an array of nearest-neighbor gates. The technique is based on the circuit identity shown in Fig. 8. The strategy is to modify the decomposition shown in Fig. 7 by inserting an identity, in the form of a CNOT cascade and its inverse (a similar cascade), into the circuit next to each CNOT. The inverse cascades are absorbed into the adjacent uniformly controlled gates. The remaining cascades, together with the original CNOTs, can be efficiently implemented using nearest-neighbor CNOTs, as illustrated in Fig. 8. The control node elimination steps for the nearest-neighbor implementation are shown in Fig. 9.

The complexity of the nearest-neighbor implementation depends on the relative 
order of the target and control qubits, and the order in which the control qubits 
are eliminated. An efficient strategy is to first eliminate the control nodes that are 
furthest apart from the target. Furthermore, for the gates with numerous control 
nodes, it is advantageous to use a sequence of swap gates to move the target qubit 
next to the center of the chain before the operation and back after it. A swap gate 
can be realized using three consecutive CNOTs [2]. 
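The swap identity quoted above is quickly verified with 4×4 permutation matrices (NumPy assumed; qubit 1 is the most significant bit, as in Eq. (12)):

```python
import numpy as np

# CNOT controlled by qubit 1 (target 2) and CNOT controlled by qubit 2 (target 1).
CNOT_12 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])
CNOT_21 = np.array([[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]])
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])

# Three alternating CNOTs exchange the states of the two qubits.
swap_from_cnots = CNOT_12 @ CNOT_21 @ CNOT_12
```

On a linear chain, moving the target qubit by s positions in this way costs 3s nearest-neighbor CNOTs each way.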
Figure 9: Method for reducing a uniformly controlled gate to nearest-neighbor gates: (a) uniformly controlled rotation and (b) general one-qubit gate. The circuit diagrams may also be mirrored horizontally.

Using this strategy, a gate $F_{n-1}^t(U(2))$ can be implemented with at most

w„, a ,=5 2 » +2 „-6 a -{! ;:™ d n P3> 

nearest-neighbor CNOTs. Here $s = 1, \ldots, \lceil n/2 \rceil$ is the distance of the target qubit t from the end of the chain. Figure 10(a) depicts the resulting circuit for the case k = 4 and s = 1. A similar treatment of the gate $F_{n-1}^t(R_{\mathbf{a}})$ yields a quantum gate array with

CWti, s) = -2 n + 3?i - 6s - ( 3 ' n 6V ™ (24) 
uy ' y 6 \ f , n odd v ' 

nearest-neighbor CNOTs. Figure 10(b) displays an example circuit for the case k = 4 and s = 1.

We note that the uniformly controlled one-qubit gate carries 3 • 2 k degrees of 
freedom, and requires roughly the same number of elementary gates for its imple- 
mentation. Thus an array of nearest-neighbor CNOTs provides an efficient imple- 
mentation for uniformly controlled one-qubit gates, and therefore for any uniformly 
controlled gate. In particular this can be utilized to efficiently implement unstruc- 
tured unitary transformations. Furthermore, the structure of the nearest-neighbor 
circuit allows several gate operations to be executed in parallel which may further 
reduce the execution time of the algorithm. 



4 QR decomposition 

Numerical matrix computation [12] is a field of mathematics that provides useful tools for constructing and manipulating quantum gate arrays. For example, the QR decomposition theorem states that for each complex matrix A there exists a unitary matrix Q and an upper triangular matrix R such that A = QR. Here the matrix Q may be written as a product of two-level matrices called Givens rotations [22]. For a unitary matrix A, the matrix R is both unitary and upper triangular, hence diagonal, and with the phase convention used below it reduces to the identity. Thus the sequence of Givens rotations yields a decomposition of any unitary matrix into two-level matrices. In turn, these two-level matrices may be decomposed into elementary gates, as shown below. Traditionally, a technique based on this principle has been employed in quantum computation to find the elementary gate decomposition of an unstructured unitary matrix [6, 9, 23]. Reference [11] presents improvements to the traditional construction that eventually lead to a quantum gate decomposition of the asymptotically minimal complexity $O(4^n)$.

Figure 10: Implementation of a fourfold uniformly controlled (a) general one-qubit gate and (b) one-parameter rotation. Here the gates $\{r_i\}$ are generic rotations about the x axis and the gates $\{u_i\}$ belong to SU(2). The alternating sequence of CNOTs and $u_i$ gates is denoted by $F_4(U(2))$. The rightmost sequence of uniformly controlled z rotations corresponds to a single diagonal gate, denoted by $\Delta_5$.

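Let us outline how to find the sequence of Givens rotations whose product implements any unitary matrix $U \in SU(2^n)$. In the case n = 1, a Givens rotation $G \in SU(2)$ corresponding to a vector $b = (b_1\ b_2)^T$ may be defined as

$$ G_b = \frac{1}{\sqrt{|b_1|^2 + |b_2|^2}} \begin{pmatrix} b_1^* & b_2^* \\ -b_2 & b_1 \end{pmatrix}. \qquad (25) $$

It maps b to $(\sqrt{|b_1|^2 + |b_2|^2},\ 0)^T$, i.e., it nullifies the second element and leaves the first one real and positive.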
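Eq. (25) can be checked on a single vector. A minimal NumPy sketch with the hypothetical helper `givens`:

```python
import numpy as np


def givens(b):
    """G_b of Eq. (25): maps b = (b1, b2) to (||b||, 0) and has determinant 1."""
    b1, b2 = b
    norm = np.sqrt(abs(b1) ** 2 + abs(b2) ** 2)
    return np.array([[np.conj(b1), np.conj(b2)], [-b2, b1]]) / norm
```

For `b = np.array([0.3 - 0.4j, 1.2 + 0.5j])`, the product `givens(b) @ b` equals $(\|b\|, 0)^T$ up to rounding.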



For a general number of qubits n, a Givens rotation is a two-level matrix acting non-trivially only on a subspace spanned by two basis vectors, for example, $|e_j\rangle$ and $|e_k\rangle$. When a Givens rotation is used to nullify elements of a matrix $U \in SU(N)$, we also need to specify the column that is used as the vector b in Eq. (25). Hence, we define a Givens rotation ${}^{i}G_{j,k}$ to be a two-level complex matrix that selectively nullifies the element in the ith column and jth row of the matrix U against the element in the ith column and kth row. For example,



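$$ {}^{1}G_{N,N-1}\, U = \begin{pmatrix} u_{1,1} & u_{1,2} & \cdots & u_{1,N} \\ \vdots & \vdots & & \vdots \\ u_{N-2,1} & u_{N-2,2} & \cdots & u_{N-2,N} \\ \tilde{u}_{N-1,1} & \tilde{u}_{N-1,2} & \cdots & \tilde{u}_{N-1,N} \\ 0 & \tilde{u}_{N,2} & \cdots & \tilde{u}_{N,N} \end{pmatrix} =: \tilde{U}, \qquad (26) $$

where the elements of $\tilde{U}$ that differ from those of U are indicated with a tilde.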

Applying ${}^{1}G_{N-1,N-2}$ to the modified matrix $\tilde{U}$, we can nullify the element $\tilde{u}_{N-1,1}$, and similarly the whole first column except the diagonal element. The unitarity of the matrix U fixes the absolute value of the diagonal element to unity, and the definition of a Givens rotation in Eq. (25) ensures that its phase vanishes, i.e., it takes the value 1; unitarity then also forces the remaining elements of the first row to vanish. Further application of the method to the columns 2 to N - 1 results in an identity matrix, the last diagonal element being fixed to unity by the unit determinant:




$$ \Big[ \prod_{i=1}^{2^n-1} \prod_{j=2^n-i+1}^{2^n} {}^{2^n-i}G_{j,j-1} \Big] U = I, \qquad (27) $$

where the product of the non-commuting matrices is taken from left to right, as always in this chapter. Equation (27) yields the factorization of an arbitrary matrix $U \in SU(2^n)$ into Givens rotations



$$ U = \prod_{i=1}^{2^n-1} \prod_{j=1}^{2^n-i} {}^{i}G^{\dagger}_{2^n-j+1,\,2^n-j}, \qquad (28) $$

which provides an implementation of an arbitrary quantum gate, provided that an elementary gate representation of each of the Givens rotations is known. We denote the nontrivial 2×2 block of ${}^{i}G_{j,k}$, i.e., its action in the subspace spanned by $|e_j\rangle$ and $|e_k\rangle$, by ${}^{i}\Gamma_{j,k}$.
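The whole procedure of Eqs. (26)-(28) fits in a few lines. This NumPy sketch (function names are ours, not from the chapter) nullifies the columns of a random $U \in SU(N)$ from the bottom up with embedded two-level rotations of the form (25), checks that the result is the identity, and rebuilds U from the Hermitian conjugates multiplied in the order of application. It uses the standard binary ordering and makes no attempt at the Gray-code ordering discussed next.

```python
import numpy as np


def givens_factorization(U, tol=1e-12):
    """Nullify U in SU(N) column by column with two-level Givens rotations, Eqs. (26)-(28).

    Returns the embedded N x N rotations in the order they are applied and the final
    matrix, which should be the identity when det U = 1.
    """
    N = U.shape[0]
    W = np.array(U, dtype=complex)
    rotations = []
    for col in range(N - 1):                     # columns 1, ..., N-1 (0-based here)
        for row in range(N - 1, col, -1):        # null row `row` against row `row - 1`
            b = W[[row - 1, row], col]
            norm = np.linalg.norm(b)
            if norm < tol:
                continue                         # nothing to rotate in this pair
            G2 = np.array([[b[0].conj(), b[1].conj()], [-b[1], b[0]]]) / norm  # Eq. (25)
            G = np.eye(N, dtype=complex)
            G[np.ix_([row - 1, row], [row - 1, row])] = G2
            W = G @ W
            rotations.append(G)
    return rotations, W
```

For a generic matrix, all $N(N-1)/2 = 2^{n-1}(2^n - 1)$ rotations are needed.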

In the first presentation of the QR decomposition for arbitrary quantum gates [6], the basis vectors were ordered using the standard binary coding. Thus, the Givens rotations acting on adjacent basis vectors do not, in general, correspond directly to any known gate. However, if the basis vectors are permuted before the action of every rotation and permuted back afterwards, the rotations may be written as fully controlled one-qubit gates. The permutation for each of the $O(4^n)$ rotations required of the order of n fully controlled NOT gates, each of which required of the order of $n^2$ CNOTs. Hence, the complexity of the whole decomposition turned out to be $O(n^3 4^n)$.

Instead of labelling the basis vectors using the standard binary coding, the binary reflected Gray code was employed in Ref. [11]. The special property of any Gray-code-ordered basis is that only one bit changes between adjacent basis vectors $|e_i\rangle$ and $|e_{i+1}\rangle$; see Fig. 11(a). The important consequence is that operations limited to the subspace spanned by $|e_i\rangle$ and $|e_{i+1}\rangle$ take the form of a $C^{n-1}V$ gate, where $V \in SU(2)$. Consequently, each of the $2^{n-1}(2^n - 1)$ Givens rotations $\{{}^{i}G_{j,j-1}\}$ can be implemented using only one $C^{n-1}V$ gate, and no basis-permuting gates are needed between them. Since a $C^{n-1}V$ gate may be decomposed into $O(n)$ CNOTs [6], the decomposition has the complexity $O(n 4^n)$ at this point. We note that we may equally well label the basis vectors using the standard binary coding, provided that the order in which the elements of the matrix $U \in SU(2^n)$ are nullified is chosen such that the Givens rotations operate non-trivially only on basis vectors whose binary representations differ in only one bit. With the basis vectors labelled using the standard binary coding, the matrices ${}^{2^n-i}G_{j,j-1}$ in Eq. (27) become ${}^{\gamma(2^n-i)}G_{\gamma(j),\gamma(j-1)}$, where the function $\gamma(i)$ gives the integer value of the ith element of the binary reflected Gray code; see Fig. 11.

Figure 11: (a) Four-bit Gray code. White squares stand for the bit value 0 and black squares for 1. (b) The number of control nodes needed for the Givens rotations nullifying the elements of the matrix U. The width of the lines between the matrix elements represents the number of control nodes that may be eliminated.
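The binary reflected Gray code has the closed form $i \oplus \lfloor i/2 \rfloor$ for a 0-based index i; the chapter's $\gamma(i)$ counts from the first element, so the indexing here is shifted by one (an assumption of this sketch). A short Python sketch that generates a four-bit code like the one in Fig. 11(a) and checks the one-bit-change property:

```python
def gray(i):
    """Integer value of the i-th word (0-based) of the binary reflected Gray code."""
    return i ^ (i >> 1)


def gray_sequence(n):
    """All n-bit Gray-code words as bit strings, in order."""
    return [format(gray(i), f"0{n}b") for i in range(2 ** n)]


def differ_in_one_bit(p, q):
    """True if the integers p and q differ in exactly one binary digit."""
    return bin(p ^ q).count("1") == 1
```

For example, `gray_sequence(2)` lists the words 00, 01, 11, 10, and consecutive words of `gray_sequence(4)` always satisfy `differ_in_one_bit`.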

Furthermore, we find that only a small fraction of the control nodes in the fully controlled one-qubit gates turns out to be essential for the final result of the decomposition. If s control nodes are removed from the fully controlled gate implementing ${}^{i}G_{j,j-1}$, the matrix representation of the resulting operation is no longer two-level but rather $2^{s+1}$-level, i.e., the operation applies the same 2×2 block to all pairs of basis vectors that satisfy the remaining control conditions. Once some element of the matrix U being decomposed has become zero in the diagonalization process, control nodes may be removed from the subsequent fully controlled gates only in such a way that the zeroed element does not mix with the non-zero elements.

Figure 12: Quantum circuit equivalent to an arbitrary three-qubit quantum gate $U \in SU(8)$. The control nodes marked with a black square in the upper right-hand corner are superfluous and may be omitted to decrease the complexity of the decomposition.

Figure 11(b) illustrates how the number of control nodes needed in the diagonalization is determined. The total number of gates in the implementation depends on the number of control nodes in each of the gates involved. Let us denote by $g_n(k)$ the number of $C^kV$ gates required for the whole diagonalization process of an n-qubit gate. In Ref. [11], a recursion relation for $g_n(k)$ was derived. The relation has an awkward analytic solution and, therefore, it was bounded from above as

$$g_n(n-k) \le 2^{n+k}. \qquad (29)$$



Equation (29) shows that the number of $k$-fold controlled gates decreases exponentially with the number of control nodes. On the other hand, a $C^kV$ gate takes $O(n)$ gates to implement [6]. Together, these results imply that the gate array for an n-qubit unitary gate involves $O(4^n)$ elementary gates. Figure 12 shows an example of the quantum circuit equivalent to an arbitrary three-qubit quantum gate $U \in SU(8)$.

To calculate the number of elementary gates, we use the decompositions described in Ref. [6]. For large n, the leading contribution to the number of CNOTs is approximately $8.7 \times 4^n$, while the upper bound from Eq. (29) yields approximately $11 \times 4^n$. We note that neither of the two techniques alone suffices to reduce the circuit complexity to $O(4^n)$: the Gray-code ordering of the basis vectors does not, and neither does the elimination of control nodes. As a curiosity, the technique of eliminating control nodes has recently been generalized and carried back to numerical matrix computation [24].



Decompositions of general quantum gates 






5 Cosine-sine decomposition 

5.1 Recursive cosine-sine decomposition 

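The CSD of a unitary $2^n \times 2^n$ matrix may be expressed as [25]

$$U = \underbrace{\begin{pmatrix} u_1 & 0 \\ 0 & u_2 \end{pmatrix}}_{U_1} \underbrace{\begin{pmatrix} c & s \\ -s & c \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} u_3 & 0 \\ 0 & u_4 \end{pmatrix}}_{U_2}, \qquad (30)$$

where $\{u_i\}$ are unitary $2^{n-1} \times 2^{n-1}$ matrices and the real diagonal matrices $c$ and $s$ are of the form $c = \mathrm{diag}_l(\cos\theta_l)$ and $s = \mathrm{diag}_l(\sin\theta_l)$ ($l = 1, \ldots, 2^{n-1}$). The matrix $A$ corresponds to a uniformly controlled $y$ rotation $F^{n-1}(R_y)$ whose rotation angles are determined by the $\theta_l$. The matrices $U_1$ and $U_2$ correspond to uniformly controlled $(n-1)$-qubit gates ($SU(2^{n-1})$). By applying Eq. (30) recursively to the uniformly controlled multiqubit gates until only uniformly controlled one-qubit gates remain, we obtain the decomposition

$$U(2^n) = F^{n-1}(U(2)) \prod_{i=1}^{2^{n-1}-1} \left[ F^{n-1}_{\varsigma(i)}(R_y)\, F^{n-1}(U(2)) \right], \qquad (31)$$

where $\varsigma$ is the ruler function [26].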
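To make the structure of Eq. (31) concrete, here is a sketch that assumes the common definition of the ruler function: the exponent of the largest power of 2 dividing $2i$, which gives 1, 2, 1, 3, 1, 2, 1, 4, and so on. It lists the factors of Eq. (31) from left to right. This exposes the gate counts directly: $2^{n-1}$ uniformly controlled one-qubit gates and $2^{n-1}-1$ uniformly controlled $y$ rotations. The function names are hypothetical. How the values $\varsigma(i)$ map onto physical qubit labels depends on the qubit-ordering convention, which this sketch does not fix.

```python
def ruler(i: int) -> int:
    """Ruler function: exponent of the largest power of 2 dividing 2*i, for i >= 1."""
    return (i & -i).bit_length()

def csd_gate_sequence(n: int) -> list:
    """Left-to-right factors of Eq. (31): 'U' marks a gate F^{n-1}(U(2)) and an
    integer k marks a uniformly controlled y rotation labelled by ruler value k."""
    seq = ["U"]
    for i in range(1, 2 ** (n - 1)):
        seq += [ruler(i), "U"]
    return seq

print([ruler(i) for i in range(1, 9)])  # [1, 2, 1, 3, 1, 2, 1, 4]
print(csd_gate_sequence(3))             # ['U', 1, 'U', 2, 'U', 1, 'U']
```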

We begin to decompose the rightmost gate in Eq. (33) into elementary gates by writing an identity $I = \Delta_n \Delta_n^\dagger$ between the gates $F^{n-1}_{\varsigma(2^{n-1}-1)}(R_y)$ and $F^{n-1}(U(2))$. Here we choose $\Delta_n$ such that

$$F^{n-1}(U(2)) = \Delta_n \tilde{F}^{n-1}(U(2)), \qquad (32)$$

where the gate $\tilde{F}^{n-1}(U(2))$ introduced in Sec. 3.2 needs only $2^{n-1} - 1$ CNOTs to implement. We are now left with the product



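$$U(2^n) = F^{n-1}(U(2)) \prod_{i=1}^{2^{n-1}-2} \left[ F^{n-1}_{\varsigma(i)}(R_y)\, F^{n-1}(U(2)) \right] \qquad (33)$$

$$\times\, F^{n-1}_{\varsigma(2^{n-1}-1)}(R_y)\, \Delta_n\, \tilde{F}^{n-1}(U(2)), \qquad (34)$$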

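where the product $F^{n-1}_{\varsigma(2^{n-1}-1)}(R_y)\,\Delta_n$ may be written as a single uniformly controlled one-qubit gate $F^{n-1}_{\varsigma(2^{n-1}-1)}(U(2))$. This works because a diagonal gate can be regarded as a uniformly controlled diagonal one-qubit gate with any qubit as the target. Continuing to change the $F^{n-1}(U(2))$ gates into $\tilde{F}^{n-1}(U(2))$ gates by adding diagonal gates, we finally obtain the decomposition

$$U(2^n) = \Delta_n \tilde{F}^{n-1}(U(2)) \prod_{i=1}^{2^{n-1}-1} \left[ \tilde{F}^{n-1}_{\varsigma(i)}(U(2))\, \tilde{F}^{n-1}(U(2)) \right]. \qquad (35)$$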
Figure 13: Quantum circuit for a three-qubit gate obtained using (a) the current CSD and (b) an alternative CSD.



There are $2^n - 1$ gates $\tilde{F}^{n-1}(U(2))$ in Eq. (35), each of which may be decomposed into $2^{n-1} - 1$ CNOTs. In addition, we have to implement the diagonal gate $\Delta_n$ using $2^n - 2$ CNOTs [20]. One more CNOT may be eliminated [15], and hence the current CSD requires $4^n/2 - 2^{n-1} - 2$ CNOTs. The circuit diagram for the CSD in the case $n = 3$ is shown in Fig. 13(a), where the diagonal gate $\Delta_3$ is written as a cascade of uniformly controlled $z$ rotations [13]. There also exists a slightly different version of the CSD, in which the matrix $U \in SU(2^n)$ is decomposed into uniformly controlled $z$ and $y$ rotations only [13]. An example of this alternative CSD is shown in Fig. 13(b). The alternative CSD can also be obtained from the current one by writing the rightmost UCG in the product of Eq. (33) as a product $F^{n-1}(R_z)\,F^{n-1}(R_y)\,F^{n-1}(R_z)$. Being diagonal, the gate $F^{n-1}(R_z)$ may be merged into the adjacent UCG. The process can be continued until the last $F^{n-1}(R_z)$, arising from the leftmost UCG, is merged into the diagonal gate $\Delta_n$.
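The CNOT count above can be checked term by term. The sketch below uses exact integer arithmetic and assumes only the costs quoted in the text. It adds up $2^n - 1$ gates $\tilde{F}^{n-1}(U(2))$ at $2^{n-1} - 1$ CNOTs each and the diagonal gate at $2^n - 2$ CNOTs, then subtracts the one CNOT eliminated as in [15]. The result should agree with $4^n/2 - 2^{n-1} - 2$ and with the CSD row of Table 2. The function names are illustrative.

```python
def csd_cnots_by_parts(n: int) -> int:
    """CNOTs of the recursive CSD, summed from its parts (valid for n >= 2)."""
    ucgs = (2 ** n - 1) * (2 ** (n - 1) - 1)  # 2^n - 1 gates F~^{n-1}(U(2))
    diagonal = 2 ** n - 2                      # the diagonal gate Delta_n
    return ucgs + diagonal - 1                 # one further CNOT eliminated [15]

def csd_cnots_closed_form(n: int) -> int:
    """The closed form 4^n / 2 - 2^(n-1) - 2 quoted in the text."""
    return 4 ** n // 2 - 2 ** (n - 1) - 2

# CSD row of Table 2
table2_csd = {2: 4, 3: 26, 4: 118, 5: 494, 6: 2014, 7: 8126, 8: 32638, 9: 130814}
for n, expected in table2_csd.items():
    assert csd_cnots_by_parts(n) == csd_cnots_closed_form(n) == expected
```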

5.2 Top down approach 

In addition to the CSD described above, Ref. [16] presents an alternative approach employing the cosine-sine decomposition. This method is called the NQ decomposition³, and it is almost as efficient as the CSD discussed in Sec. 5.1. The first step of the NQ method is the same as in the CSD, shown in Fig. 14(a); see also Eq. (30). However, the CSD step is not applied recursively. Instead, the control nodes in the UCGs are eliminated using the quantum multiplexor shown in Fig. 14(b). After applying the CSD and the quantum multiplexor, we are left with three uniformly controlled rotations separating four uncontrolled $(n-1)$-qubit gates. Since the NQ step produces uncontrolled gates acting on fewer qubits, it is also called a top-down approach.

3 NQ stands for n qubits.

Figure 14: Circuit diagram for (a) the cosine-sine decomposition and (b) the quantum multiplexor.

Let us now motivate the validity of the quantum multiplexor. It is very similar to the constant quantum multiplexor presented in Sec. 3.2, with three differences: the matrices corresponding to Eq. (13) are $2^{n-1} \times 2^{n-1}$-dimensional, the matrix $r$ is omitted, and the diagonal matrix $d$ may depend on the matrices $a$ and $b$. It is an open problem whether a constant quantum multiplexor exists in this general case, i.e., whether we can find a diagonal $r \in SU(2^n)$ for every $X \in SU(2^n)$ such that the eigenvalues of $rXr$ are fixed. The matrix equation corresponding to Fig. 14(b) reads

$$\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix} = \begin{pmatrix} u & 0 \\ 0 & u \end{pmatrix} \begin{pmatrix} d & 0 \\ 0 & d^\dagger \end{pmatrix} \begin{pmatrix} v & 0 \\ 0 & v \end{pmatrix}, \qquad (36)$$

where $a$, $b$, $u$ and $v$ are unitary and $d$ is a diagonal unitary matrix, all of size $2^{n-1} \times 2^{n-1}$. Equation (36) yields the matrix equations

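$$a = u d v, \qquad (37)$$

$$b = u d^\dagger v, \qquad (38)$$

or, equivalently,

$$a b^\dagger = u d^2 u^\dagger, \qquad (39)$$

$$v = d u^\dagger b = d^\dagger u^\dagger a. \qquad (40)$$

Since $ab^\dagger$ is unitary, it can be diagonalized by a unitary similarity transformation. Diagonalizing it gives both the transformation $u$ and the eigenvalue matrix $d^2$. The matrix $d$ then follows by taking square roots of the eigenvalues. The matrix $v$ is obtained by inserting these results into Eq. (40). Hence, we have proven the quantum multiplexor in Fig. 14(b).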
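The constructive proof above translates directly into a numerical recipe. The following is a minimal pure-Python sketch for the smallest case, in which $a$ and $b$ are $2 \times 2$ blocks ($n = 2$). It assumes that $ab^\dagger$ has two distinct eigenvalues, so that its normalized eigenvectors are orthogonal and form the unitary $u$. The sketch diagonalizes $ab^\dagger$ as in Eq. (39), takes $d$ as an elementwise square root of $d^2$ (any branch works), and obtains $v$ from Eq. (40). All helper names are hypothetical. For larger blocks, one would use a numerical linear-algebra library instead of the hand-written $2 \times 2$ eigensolver.

```python
import cmath
import math
import random

def matmul(x, y):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(x):
    """Conjugate transpose of a 2x2 matrix."""
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]

def random_unitary(rng):
    """Random 2x2 unitary e^{i phi} [[alpha, -conj(beta)], [beta, conj(alpha)]]."""
    t = rng.uniform(0.0, math.pi / 2)
    alpha = math.cos(t) * cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))
    beta = math.sin(t) * cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))
    phase = cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))
    return [[phase * alpha, -phase * beta.conjugate()],
            [phase * beta, phase * alpha.conjugate()]]

def multiplexor(a, b):
    """Return (u, d, v) with a = u d v and b = u d^dagger v, Eqs. (37)-(38)."""
    w = matmul(a, dagger(b))                 # w = a b^dagger = u d^2 u^dagger, Eq. (39)
    p, q, r, s = w[0][0], w[0][1], w[1][0], w[1][1]
    tr, det = p + s, p * s - q * r
    disc = cmath.sqrt(tr * tr - 4 * det)
    lams = [(tr + disc) / 2, (tr - disc) / 2]  # eigenvalues = diagonal of d^2
    cols = []
    for lam in lams:
        # Both candidates solve (w - lam I) x = 0; keep the larger (better-conditioned) one.
        c1, c2 = (q, lam - p), (lam - s, r)
        x = max(c1, c2, key=lambda c: abs(c[0]) ** 2 + abs(c[1]) ** 2)
        norm = math.sqrt(abs(x[0]) ** 2 + abs(x[1]) ** 2)
        cols.append((x[0] / norm, x[1] / norm))
    u = [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
    d = [[cmath.sqrt(lams[0]), 0j], [0j, cmath.sqrt(lams[1])]]
    v = matmul(d, matmul(dagger(u), b))      # v = d u^dagger b, Eq. (40)
    return u, d, v
```

Usage: draw two random unitaries with `rng = random.Random(2024)`, call `multiplexor(a, b)`, and check that `matmul(u, matmul(d, v))` reproduces `a` and that `matmul(u, matmul(dagger(d), v))` reproduces `b`.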

The NQ step is applied recursively to all gates except the uniformly controlled rotations until the two-qubit level is reached. The two-qubit gates are decomposed using the minimal elementary gate construction shown in Fig. 15. Diagonal gates commute with the control nodes of the UCGs. Hence, all but one of the resulting two-qubit gates may be implemented up to a diagonal gate, i.e., using only two CNOTs, as shown in the leftmost part of Fig. 15.

Figure 15: The minimal elementary gate construction for a two-qubit gate [10].

We will now calculate the number of CNOTs involved in the NQ decomposition of an unstructured $U \in SU(2^n)$. Let us denote this number by $a_n$. The NQ step produces four unstructured gates in $SU(2^{n-1})$ and three $F^{n-1}(R)$ gates, each of which may be implemented using $2^{n-1}$ CNOTs. We therefore obtain the recursion relation

$$a_n = 4a_{n-1} + \frac{3}{2}2^n. \qquad (41)$$



Using the condition $a_2 = 2$ discussed above, and adding one CNOT for the only two-qubit gate that needs three CNOTs for its implementation, we obtain the result

$$a_n = \frac{1}{2}4^n - \frac{3}{2}2^n + 1. \qquad (42)$$



Thus, compared with the $\frac{1}{2}4^n - \frac{1}{2}2^n - 2$ CNOTs of the CSD, the NQ decomposition yields the same result in the leading order. However, when we compare the number of one-qubit gates, or alternatively of elementary rotations, the CSD is found to be more efficient; see Table 1.
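Equations (41) and (42) can be cross-checked by iterating the recursion directly. The sketch below uses exact rational arithmetic from Python's standard `fractions` module, and its function names are illustrative. It starts from $a_2 = 2$, applies Eq. (41), adds the single extra CNOT, and compares the result with the closed form and with the NQ row of Table 2.

```python
from fractions import Fraction

def nq_cnots_recursive(n: int) -> int:
    """Iterate a_k = 4 a_{k-1} + (3/2) 2^k from a_2 = 2, then add the one extra CNOT."""
    a = 2
    for k in range(3, n + 1):
        a = 4 * a + 3 * 2 ** (k - 1)   # (3/2) 2^k written as 3 * 2^(k-1)
    return a + 1

def nq_cnots_closed_form(n: int) -> Fraction:
    """Eq. (42): (1/2) 4^n - (3/2) 2^n + 1."""
    return Fraction(1, 2) * 4 ** n - Fraction(3, 2) * 2 ** n + 1

# NQ row of Table 2
table2_nq = {2: 3, 3: 21, 4: 105, 5: 465, 6: 1953, 7: 8001, 8: 32385, 9: 130305}
for n, expected in table2_nq.items():
    assert nq_cnots_recursive(n) == nq_cnots_closed_form(n) == expected
```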



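Gate type                          NQ                                                    CSD
CNOT                               $\frac{1}{2}4^n - \frac{3}{2}2^n + 1$                 $\frac{1}{2}4^n - \frac{1}{2}2^n - 2$
Elementary rotations               $\frac{9}{8}4^n - \frac{3}{2}2^n + 3$                 $4^n - 1$
One-qubit gates (U(2) or SU(2))    $\frac{17}{24}4^n - \frac{3}{2}2^n - \frac{1}{3}$     $\frac{1}{2}4^n + \frac{1}{2}2^n - n - 1$

Table 1: Comparison of the gate counts required to implement a general n-qubit gate using the NQ decomposition [16] and the recursive CSD for unstructured n-qubit gates.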

The number of CNOTs in the NQ decomposition may be reduced further by noting that the resulting uniformly controlled $y$ rotations may always be implemented up to a diagonal gate, as seen from Fig. 14(a). Since a $\tilde{F}^k(U(2))$ gate needs one CNOT fewer than $F^k(R_y)$, we obtain the recursion relation

$$a_n = 4a_{n-1} + \frac{3}{2}2^n - 1, \qquad (43)$$

whose solution, with the same initial condition $a_2 = 2$ and the same single additional CNOT as above, is

$$a_n = \frac{23}{48}4^n - \frac{3}{2}2^n + \frac{4}{3}. \qquad (44)$$

This result is the first known to require fewer than $\frac{1}{2}4^n$ CNOTs in the leading order.
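The same kind of check applies to the improved recursion (43), in which each NQ step saves one CNOT. In the sketch below, the closed form (44) should reproduce the iterated values exactly. The sketch again uses exact fractions, and its function names are illustrative. Tables 2 and 3 do not list this variant, so the comparison is only between Eqs. (43) and (44).

```python
from fractions import Fraction

def improved_nq_cnots_recursive(n: int) -> int:
    """Iterate a_k = 4 a_{k-1} + (3/2) 2^k - 1 from a_2 = 2, then add the extra CNOT."""
    a = 2
    for k in range(3, n + 1):
        a = 4 * a + 3 * 2 ** (k - 1) - 1
    return a + 1

def improved_nq_cnots_closed_form(n: int) -> Fraction:
    """Eq. (44): (23/48) 4^n - (3/2) 2^n + 4/3."""
    return Fraction(23, 48) * 4 ** n - Fraction(3, 2) * 2 ** n + Fraction(4, 3)

for n in range(2, 12):
    assert improved_nq_cnots_recursive(n) == improved_nq_cnots_closed_form(n)

# leading-order comparison with (1/2) 4^n
print(Fraction(23, 48) < Fraction(1, 2))  # True
```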
6 Local state preparation

The execution of any quantum algorithm requires a certain initial state as input. Depending on the physical realization of the quantum computer, convenient initialization procedures may produce only a limited range of states, possibly not including the desired initial state. This raises the problem of local state preparation⁴, i.e., how to implement the transformation of an arbitrary quantum state into another one.

The configuration space of the n-qubit quantum register is a $2^n$-dimensional complex space. Excluding the global phase and the state normalization, a general unitary transformation taking a given n-qubit state to another must have at least $2 \times 2^n - 2$ real degrees of freedom. Hence, in the worst case, the corresponding quantum circuit must involve at least $2^{n+1} - 2$ elementary rotations, each carrying one degree of freedom. Since each CNOT can bind at most four elementary rotations [10], at least $\lceil \frac{1}{4}(2^{n+1} - 3n - 2) \rceil$ CNOTs are needed. However, no quantum circuit construction attaining this minimal complexity has been presented in the literature. An upper bound on the number of gates needed for state preparation was considered by Knill [8], who found that $O(n2^n)$ gates suffice to implement the transformation. More recently, a sufficient circuit of $O(2^n)$ elementary gates was obtained in Ref. [27] (see also Ref. [17]) as a special case of the method developed for the QR decomposition of a general quantum gate in Ref. [11]. In this section, we present the best known method for local state preparation, first introduced in Ref. [15].
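For small registers, the two counting bounds above evaluate as shown below. This sketch simply transcribes the formulas: $2^{n+1} - 2$ real degrees of freedom, and at least $\lceil \frac{1}{4}(2^{n+1} - 3n - 2) \rceil$ CNOTs, computed with integer ceiling division. The function names are illustrative.

```python
def state_prep_dof(n: int) -> int:
    """Real degrees of freedom to map one n-qubit state to another: 2^(n+1) - 2."""
    return 2 ** (n + 1) - 2

def state_prep_cnot_lower_bound(n: int) -> int:
    """ceil((2^(n+1) - 3n - 2) / 4), clamped at zero for the smallest n."""
    numerator = 2 ** (n + 1) - 3 * n - 2
    return max(0, -(-numerator // 4))   # -(-x // 4) is ceiling division

for n in range(1, 7):
    print(n, state_prep_dof(n), state_prep_cnot_lower_bound(n))
```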

Our aim is to build a fixed-structure circuit that takes any given input state $|a\rangle_n$ to any chosen state $|b\rangle_n$. We begin by noting that once we know an efficient circuit taking $|a\rangle_n$ to some fixed vector, for example $|e_1\rangle_n$, we may use the inverse of that circuit, with different parameters, to transform $|e_1\rangle_n$ to $|b\rangle_n$. The transformation from $|a\rangle_n$ to $|e_1\rangle_n$ consists of a sequence of gate pairs

$$S_a = \prod_{i=1}^{n} \left[ \left( F^{i-1}_{i}(R_y)\, F^{i-1}_{i}(R_z) \right) \otimes I_{n-i} \right]. \qquad (45)$$

The effect of the gate pair $F^{i-1}_{i}(R_y)\,F^{i-1}_{i}(R_z)$ on the state $|a\rangle_i$ is to nullify half of its elements:

$$F^{i-1}_{i}(R_y)\, F^{i-1}_{i}(R_z)\, |a\rangle_i = |a'\rangle_{i-1} \otimes |0\rangle_1. \qquad (46)$$

Hence, each successive gate pair nullifies half of the elements of the state vector that have not yet been zeroed, and we have $S_a|a\rangle_n = |e_1\rangle_n$ up to a global phase.
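The following numerical sketch shows one way to realize Eq. (46). For each pair of amplitudes selected by the control qubits, an $R_z$ rotation equalizes the two phases, and an $R_y$ rotation then moves all the weight onto the $|0\rangle$ component. The sketch makes three assumptions. It uses the rotation conventions $R_z(\alpha) = \mathrm{diag}(e^{-i\alpha/2}, e^{i\alpha/2})$ and $R_y(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & \sin\frac{\theta}{2} \\ -\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}$. It also assumes that the target is the last (least significant) qubit, so that the pairs are the amplitudes with indices $2j$ and $2j+1$. The function names are hypothetical.

```python
import cmath
import math

def nullify_half(state):
    """Apply one uniformly controlled R_y R_z pair to the last qubit.

    Returns the amplitudes left on |0> of the last qubit (the state a' of
    Eq. (46)) and the largest amplitude left on |1>, which should be ~0."""
    reduced, residual = [], 0.0
    for j in range(0, len(state), 2):
        x0, x1 = state[j], state[j + 1]
        alpha = cmath.phase(x0) - cmath.phase(x1)   # R_z angle for this branch
        theta = 2 * math.atan2(abs(x1), abs(x0))    # R_y angle for this branch
        # R_z(alpha) = diag(e^{-i alpha/2}, e^{i alpha/2}) gives both entries a common phase
        y0 = cmath.exp(-0.5j * alpha) * x0
        y1 = cmath.exp(0.5j * alpha) * x1
        c, s = math.cos(theta / 2), math.sin(theta / 2)
        reduced.append(c * y0 + s * y1)                   # component on |0>
        residual = max(residual, abs(-s * y0 + c * y1))   # component on |1>
    return reduced, residual

def reduce_to_e1(state):
    """Apply the n gate pairs of S_a; return the final amplitude and max residual."""
    worst = 0.0
    while len(state) > 1:
        state, res = nullify_half(state)
        worst = max(worst, res)
    return state[0], worst
```

The final amplitude has modulus one, which matches $S_a|a\rangle_n = |e_1\rangle_n$ up to the global phase that the text allows.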

Now we note that the pair of gates $F^{n-1}_{n}(R_y)\,F^{n-1}_{n}(R_z) = F^{n-1}_{n}(U(2))$ may be replaced by the gate

$$\tilde{F}^{n-1}_{n}(U(2)) = \Delta_n F^{n-1}_{n}(U(2)), \qquad (47)$$

4 We use the word local to distinguish the state preparation discussed here from remote state preparation, which is related to quantum teleportation.
Figure 16: Quantum circuit for transforming an arbitrary n-qubit state vector $|a\rangle_n$ into the desired state vector $|b\rangle_n$. The resulting gates are of the form $\tilde{F}^{k}_{k+1}(U(2))$, which are efficient to implement; see Fig. 7.



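since the diagonal gate

$$\Delta_n = \Delta^{0}_{n-1} \otimes |0\rangle\langle 0| + \Delta^{1}_{n-1} \otimes |1\rangle\langle 1| \qquad (48)$$

does not mix the states:

$$\Delta_n F^{n-1}_{n}(U(2))\, |a\rangle_n = \Delta_n \left( |a'\rangle_{n-1} \otimes |0\rangle_1 \right) = \left( \Delta^{0}_{n-1} |a'\rangle_{n-1} \right) \otimes |0\rangle_1 = |a''\rangle_{n-1} \otimes |0\rangle_1. \qquad (49)$$

The subsequent gate pairs handle an arbitrary $(n-1)$-qubit state anyway, so replacing $|a'\rangle_{n-1}$ by $|a''\rangle_{n-1}$ does no harm.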

After combining the $n-1$ pairs of adjacent gates $F^{k}_{k+1}(R_y)\,F^{k}_{k+1}(R_z)$, where $k = 1, \ldots, n-1$, we find that the entire circuit for transforming $|a\rangle$ to $|b\rangle$ requires $2 \cdot 2^n - 2n - 2$ CNOTs and $2 \cdot 2^n - n - 2$ one-qubit gates. If $|a\rangle$ or $|b\rangle$ coincides with one of the basis vectors $|e_i\rangle$, the gate counts are halved in the leading order. Figure 16 shows the circuit diagram of the whole local state preparation $S_b^\dagger S_a$.
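The CNOT count above follows from the cost of the $\tilde{F}^{k}(U(2))$ gates, which need $2^k - 1$ CNOTs each (Sec. 5.1 states this for $k = n-1$). The sketch below sums this cost over $k = 1, \ldots, n-1$ for each of the two halves $S_a$ and $S_b^\dagger$ and compares the total with $2 \cdot 2^n - 2n - 2$. It also shows the case where only one half is needed, which is when $|a\rangle$ or $|b\rangle$ is a basis vector. The function names are illustrative.

```python
def half_circuit_cnots(n: int) -> int:
    """CNOTs in S_a (or S_b^dagger): one F~^k(U(2)) gate for each k = 1..n-1."""
    return sum(2 ** k - 1 for k in range(1, n))

def state_prep_cnots(n: int) -> int:
    """Full |a> -> |b> circuit: both halves."""
    return 2 * half_circuit_cnots(n)

for n in range(1, 12):
    assert state_prep_cnots(n) == 2 * 2 ** n - 2 * n - 2
    # If |a> or |b> is a basis vector, only one half is needed.
    assert half_circuit_cnots(n) == 2 ** n - n - 1
```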



7 Conclusion 



In this chapter we have studied efficient implementations of general n-qubit gates within the quantum circuit model. Of the two philosophically different approaches, the methods based on the cosine-sine decomposition were found to lead to smaller gate counts than those based on the QR decomposition. Tables 2 and 3 compare the QR decomposition, the CSD and the NQ decomposition in terms of the required number of CNOTs and the total number of elementary gates, respectively. The QR decomposition clearly has the highest gate counts. The CSD requires slightly more CNOTs than the NQ decomposition. On the other hand, the total number of elementary gates is noticeably larger in the NQ decomposition.
Table 2: Comparison of the number of CNOTs needed in different decompositions of general n-qubit gates.

n       1      2      3      4       5        6         7         8          9
QR      –      4     64    536    4156    22618    108760    486052    2078668
CSD     –      4     26    118     494     2014      8126     32638     130814
NQ      –      3     21    105     465     1953      8001     32385     130305

Table 3: Comparison of the total number of gates needed in different decompositions of general n-qubit gates.

n       1      2      3      4       5        6         7         8          9
QR      1     14    136    980    7384    42390    208820    944280    4062520
CSD     1     11     58    249    1016     4087     16374     65525     262132
NQ      1     10     54    262    1142     4758     19414     78422     315222


A special class of gates, called uniformly controlled gates, was introduced as basic building blocks of quantum circuits. The power of the gate-efficient methods employing the cosine-sine decomposition lies in the efficient implementation of uniformly controlled rotations and two-qubit gates. These gates also proved to be essential in a circuit transforming an arbitrary quantum state into another, i.e., in performing local state preparation. In the case of a one-dimensional chain of qubits, the uniformly controlled one-qubit gates were decomposed using only nearest-neighbor gates, which may turn out to be essential for experimental realizations of quantum computers. By cleverly using the nearest-neighbor decomposition in the recursive CSD of an n-qubit gate, it has been shown [15] that the number of CNOTs still scales as $4^n$ in the leading order. It is quite surprising that adding the restriction to nearest-neighbor interactions increases the gate count by less than a factor of two.

In conclusion, we have reviewed the development of circuit constructions for arbitrary quantum gates and slightly improved the lowest known CNOT count to $\frac{23}{48}4^n$ in the leading order. We have also discussed local state preparation and circuits employing only nearest-neighbor CNOTs.